AI synaptic coprocessor

ABSTRACT

A coprocessor may include a memory configured to store a plurality of Very Long Data Words, each as a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW. A processor generates search terms and a processing logic unit receives a test VLDW from the memory, receives a search term from the processor, and computes a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term. Optionally, buffers within logic circuits of processing pipelines may receive the test VLDWs.

PRIORITY APPLICATION(S)

This is a continuation application based upon U.S. patent application Serial No. 17/242,374 filed Apr. 28, 2021, which is based upon U.S. Provisional Pat. Application Serial No. 63/124,923, filed Dec. 14, 2020, the disclosures of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to the field of computers, and more particularly, this invention relates to coprocessors used with computers, such as for artificial intelligence applications.

BACKGROUND OF THE INVENTION

Artificial intelligence applied to many computer applications has grown in recent years and placed demands on the computational power of normal processors. For example, processor speeds have almost reached a maximum at about 4 GHz, ending the gains that have been reached through increased clock speeds as transistor dimensions shrink based upon Moore’s law. As semiconductor technology advances and gate lengths decrease, greater numbers of gates are placed on one chip, often more than 10 billion gates per chip. It is becoming increasingly difficult to place even greater numbers of gates on chips. One approach is to place more processors on each chip, but this requires partitioning the processing workload, synchronizing the tasks, and feeding the input and output to all processors.

Despite the growing limitations associated with Moore’s law, computationally intensive artificial intelligence (AI) applications have exploded in capabilities in the last few years, and it is necessary to exceed the computing limitations of traditional Von Neumann style central processing units (CPUs). New hardware developments have been specifically designed for artificial intelligence applications to accelerate training and performance of neural networks and reduce power consumption. The traditional solution was to reduce the size of logic gates to fit more transistors. Shrinking logic gates below about 5 nanometers (nm), however, may cause the chip to malfunction because of quantum tunneling.

New artificial intelligence hardware includes processors that enable faster processing of these AI applications with enhanced machine learning, neural networks and computer vision. Some graphics processing units (GPUs) use massively parallel architecture with thousands of smaller, more efficient processing cores to handle multiple tasks simultaneously, instead of using a few cores optimized for sequential serial processing as in the more conventional central processing units available on the market. Other techniques for increasing process capabilities for AI include application-specific integrated circuits (ASICs), but these specialized hardware circuits suffer the drawback of implementing traditional Von Neumann architecture and floating point operations, even though there have been some improvements with a neural net architecture.

A field programmable gate array (FPGA), on the other hand, may enable greater customization after manufacturing using a hardware description language, and may include the application of neural networks to analyze large amounts of data. The use of programmable circuitry in an FPGA rather than customary software instructions enables complex neural nets to be configured and reconfigured seamlessly for deep data uses. These FPGA systems, however, have limited memory and slower clock rates.

Other possibilities to meet the increasing demands of artificial intelligence applications include quantum computers, which work significantly differently than conventional computers. Instead of employing conventional “on” and “off” switches and bits depending on the electrical state, quantum computers use qubits, in which an individual bit can be in one of three states, i.e., on, off, or uniquely both on and off simultaneously. Instructions do not load sequentially, but may execute simultaneously, thus increasing speed dramatically. Advances in quantum computing are limited and it is difficult to access many items in a database at the same time and analyze different images or data points until further advancements are made in this technology area.

Although some advanced computer systems increase processing speed dramatically, these computer systems do not mimic the human mind, and instead use traditional floating point operations. Central processing units operate in a sequential manner, and even the more advanced graphics processing units operate via massive parallel processing. It is still linear processing, but the human mind is highly non-linear. The human brain has many billions of neurons, each of which may have up to 10,000 connections to other nerve cells, and externally and internally hosts hundreds of thousands of coordinated parallel processes that are mediated by millions of protein and nucleic acid molecular interactions. The complexity of the human brain is staggering. Many millions of neurons are employed at the same time with little power demand as compared to electronic circuits. Some of the more advanced chips may mimic the brain’s architecture, but these use vastly greater amounts of power with an order of magnitude fewer computational connections. Even advanced neuromorphic chips that have recently been designed are limited in the number of artificial neurons that are used because of their design limitations and manufacturing tolerances.

Some very long instruction word computer architectures take advantage of instruction level parallelism, where a fixed number of operations are formatted as one large instruction in a massively parallel architecture. The processors may reduce hardware complexity, and a compiler may create each very long instruction word, but the design limitations associated with normal processors still apply.

SUMMARY OF THE INVENTION

This summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

The coprocessor as disclosed provides enormously increased processing power without partitioning the processing load and may be used for artificial intelligence applications, including artificial general intelligence (AGI). The coprocessor expands the processing workload into very long data words having a range of about one thousand to one million or more bits, which are referred to as elastic representation VLDWs and are designed for knowledge representation in applications such as artificial intelligence and next generation databases.

The coprocessor may comprise a memory configured to store a plurality of Very Long Data Words, each comprising a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW. A processor may be configured to generate search terms. A processing logic unit may be configured to receive a test VLDW from the memory, receive a search term from the processor, and compute a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term. The processing logic unit may operate on successive test VLDWs compared against a search term. An external sensor may be connected to the coprocessor and configured to generate sensor data, and the processor is configured to receive the sensor data and generate a sensed data VLDW from the sensor data. The processing logic unit is configured to compare the sensed data VLDW to a search term.

A buffer may be configured to store the Boolean inner products resulting from the computation between each search term and the test VLDW together with the address in memory from which the test VLDW was read. The processor may be configured to compare the Boolean inner products to a threshold and allow only those Boolean inner products that are greater than the threshold to pass to the buffer for storage therein. The processor may be configured to periodically scan through the buffer to determine match results among the Boolean inner products.

A search term may comprise a VLDW, and in another example, a search term may comprise a focused search term that is modified from an original search term as a VLDW to express or extinguish features of interest. The processing logic unit may comprise one or more pipelined Boolean logic circuits that compute the Boolean inner product. The one or more pipelined Boolean logic circuits may comprise a plurality of Boolean adder circuits. The processing logic unit may comprise a plurality of pipeline Boolean logic circuits configured in parallel to each other, and each pipelined Boolean logic circuit may be loaded with the same test VLDW and a different search term.

In yet another example, a Direct Memory Access (DMA) controller may be connected to the processor and memory and configured to address and control the transfer of test VLDWs from memory to the processing logic unit. The processor may include a conventional CPU interface for communicating with external devices, wherein the processor is configured to receive very long data words as a plurality of 64-bit words via the conventional CPU interface and reformat the 64-bit words into a test VLDW having a length of about one thousand bits to at least one million bits. The processor may be configured to perform calculations at a single clock rate and compute a Boolean inner product between each search term and the test VLDW at a latency to obtain the results after multiple clocks. In yet another example, the processor includes a serial interface and digital logic. The serial interface may pass serial data to the digital logic to reformat the serial data into very long data words.

The processor may be configured to generate a plurality of test VLDWs, and the processing logic unit may include a processing buffer into which the plurality of test VLDWs are buffered. The processing logic unit may comprise a plurality of pipeline Boolean logic circuits, each having a processing buffer into which a plurality of test VLDWs are buffered.

In yet another example, a coprocessor may comprise a processor configured to generate 1) a plurality of Very Long Data Words, each comprising a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW, and 2) search terms. A processing logic unit may be coupled to the processor and include a processing buffer and configured to receive a test VLDW from the processor and buffer the test VLDW within the processing buffer, receive a search term from the processor, and compute a Boolean inner product between the search term and the test VLDW indicative of the measure of similarity between the test VLDW and the search term.

The processing logic unit may comprise a plurality of pipeline Boolean logic circuits, each having a processing buffer into which a plurality of test VLDWs are buffered. A memory may be configured to store a plurality of test VLDWs, and the processing logic unit is configured to receive test VLDWs from the memory. A Direct Memory Access (DMA) controller may be connected to the processor and memory and configured to address and control the transfer of test VLDWs from memory to the processing logic unit. The processing logic unit may comprise a first plurality of pipeline Boolean logic circuits, each having a processing buffer into which a plurality of test VLDWs are buffered, and a second plurality of pipeline Boolean logic circuits, each configured to receive a test VLDW from the memory. A storage buffer may be configured to store the Boolean inner products resulting from the computation between each search term and the test VLDW together with the address from which the test VLDW was read.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects, features and advantages of the present invention will become apparent from the Detailed Description of the invention which follows, when considered in light of the accompanying drawings in which:

FIG. 1 is a block diagram of the synaptic coprocessor showing basic components in accordance with a non-limiting example.

FIG. 2 is a block diagram of the synaptic coprocessor of FIG. 1 showing a single processing pipeline as an example data transport among components.

FIG. 3 is another block diagram of the synaptic coprocessor showing an example of the processing pipeline architecture.

FIG. 4 is another block diagram of the synaptic coprocessor showing greater detail of registers and associated components.

FIG. 5 is another block diagram of the synaptic coprocessor showing multiple processing pipelines.

FIG. 6 is another block diagram of the synaptic coprocessor showing a processing pipeline and data flow.

FIG. 7 is a block diagram of the synaptic coprocessor showing details of the attention processing unit of FIG. 3 that produces a focused search term.

FIG. 8 is another block diagram of the synaptic coprocessor showing details of the vector processing unit of FIG. 3.

FIG. 9 is a block diagram of the synaptic coprocessor showing an example of various registers.

FIG. 10 is a block diagram showing the address and data distribution in the synaptic coprocessor example of FIG. 9.

FIG. 11 is a block diagram showing the data distribution logic in the synaptic coprocessor of FIG. 9.

FIG. 11A is a block diagram of an external sensor connected to the synaptic coprocessor that generates data to the synaptic coprocessor for conversion into a very long data word.

FIG. 11B is another example of the synaptic coprocessor shown in FIG. 3, but showing a buffered processing pipeline.

FIG. 12 is a graph showing Carry-Ahead Adder (CAA) performance as a function of word size and group size.

FIG. 13 is a schematic block diagram showing logic for a 16-bit wide adder with four-bit groups that can be used with the synaptic coprocessor as a non-limiting example.

FIGS. 14A-14C are schematic block diagrams showing logic for a 64-bit wide adder that can be used with the synaptic coprocessor as a non-limiting example.

FIG. 15 is a simplified example of elastic representation VLDWs that can be used with the synaptic coprocessor as a non-limiting example.

FIG. 16 is a high-level block diagram showing generally how the elastic representation VLDWs may be generated.

FIG. 17 is a high-level block diagram of another representation of the logic used in the synaptic coprocessor.

FIG. 18A is a schematic diagram of an engram as an individual neuron for an example elastic representation VLDW used with the synaptic coprocessor.

FIG. 18B shows example elastic representation VLDWs similar to that of FIG. 18A to convey the size of an animal as used with the synaptic coprocessor.

DETAILED DESCRIPTION

Different embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments are shown. Many different forms can be set forth, and the described embodiments should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art.

Referring now to FIG. 1, there is illustrated at 100 the synaptic coprocessor that includes a processor 104 that communicates with external devices outside the synaptic coprocessor via a serial input and output port 108, and may receive and send interrupts and real-time clock signals. The synaptic coprocessor 100 includes 64-bit address and data buses 112 that communicate with other devices outside the synaptic coprocessor. A local 64-bit memory 116 is included within the synaptic coprocessor 100 that stores conventional length data. DMA logic 120 that may include DMA controller functionality may be connected to the processor 104 and to a very long data word (VLDW) memory 124 in this example. In another aspect described below with reference to FIG. 11B, a buffer may be used. The DMA controller 120 is configured to address and control the transfer of test very long data words (VLDWs) from the very long data word memory 124 to the processing logic unit 128, which includes a plurality of processing pipelines 130 as illustrated by Pipeline No. 1 to Pipeline No. N. The very long data word memory 124 is configured to store a plurality of very long data words, each formed as a test very long data word (VLDW) having a length in the range of about 1,000 bits to 1 million or more bits and containing encoded information that is distributed across the length of the VLDW. The number and range of bits may vary. The VLDW memory 124 may include RAM. The very long data words are also referred to as Elastic Representation VLDWs and are designed for knowledge representation in applications such as artificial intelligence and next generation databases as explained in greater detail below. The illustrated processor 104, processing logic unit 128, and memory 124 may contain registers for holding data, such as conventional length, e.g., 64-bit words, or very long data words as described above.

The processor 104 is configured to generate search terms and the processing logic unit 128 is configured to receive a test VLDW from the VLDW memory 124, receive a search term that had been generated from the processor, and compute a Boolean inner product between the search term and the test VLDW read from memory 124 indicative of the measure of similarity between the test VLDW and at least one search term (FIG. 2). The very long data word may include encoded information that is distributed across the length of the very long data word as a pseudorandom number, and in an example, include a globally random and locally ordered linear array of data. The search term may be a VLDW, and in an example, may be converted by an attention processing unit 134 (FIG. 3), which is part of the processing logic unit 128, into a focused search term that is modified from an original search term as a VLDW to express or extinguish features of interest as explained below. In an example, the processing logic unit 128 may be configured to operate on successive test VLDWs compared against a search term. In an example, the attention processing unit 134 is operative with a vector processing unit 138, which includes a scoring logic unit 142.

As shown in FIG. 2, the pipeline preload of data from the processor 104 may include search terms and focus terms. The focus term, i.e., the attention data word (FIG. 3), modifies the search term as the target data word to form the focused search term as the focused target word. The processor 104 generates address control instructions to the DMA controller 120 in this example, which controls memory addressing instructions over the address bus to the VLDW memory 124, such as an address range of a certain test VLDW. The measure of similarity of the search term and the very long data word at the processing logic unit 128 is a count of the number of positions in which both the search term, which may be a very long data word, and the test very long data word read from the VLDW memory 124 have a “one.” The count may be implemented by an adder tree organized as a processing pipeline 130 that can be clocked at the DMA rate. Due to the encoding that is used for the VLDW and the search term, there is no ripple, carry, or look ahead logic required, except with the adder tree in this example, which may be organized in stages to support the DMA clock rate. The data fields in a very long data word may be one bit wide in an example.
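
As a non-limiting illustration, the count described above may be modeled in software as the number of one bits in the bitwise AND of the two words. The sketch below is an assumption made for explanatory purposes and is not the hardware implementation: Python integers stand in for VLDWs, and the names boolean_inner_product, adder_tree_count and WIDTH are hypothetical. The second function models a staged adder tree, with each loop iteration corresponding to one stage of two-input adders.

```python
import random


def boolean_inner_product(search_term: int, test_vldw: int) -> int:
    """Count the bit positions in which both words hold a one."""
    return bin(search_term & test_vldw).count("1")


def adder_tree_count(word: int, width: int) -> int:
    """Sum the one-bit fields of `word` using a staged pairwise adder tree."""
    # Stage 0: one partial sum per one-bit data field.
    sums = [(word >> i) & 1 for i in range(width)]
    # Each iteration models one pipeline stage of two-input adders.
    while len(sums) > 1:
        if len(sums) % 2:
            sums.append(0)  # pad so every adder has two inputs
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
    return sums[0]


WIDTH = 1024  # assumed VLDW length for this illustration
rng = random.Random(0)
search = rng.getrandbits(WIDTH)
test = rng.getrandbits(WIDTH)

score = boolean_inner_product(search, test)
# The staged tree and the direct count agree on the same bit vector.
assert score == adder_tree_count(search & test, WIDTH)
```

In this model a 1,024-bit word needs ten adder stages, which is consistent with the results becoming available after multiple clocks.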

The processing logic unit 128 includes N processing pipelines 130, each formed as a pipeline Boolean logic circuit and corresponding to the illustrated Pipeline No. 1 to Pipeline No. N (FIG. 1) that compute the Boolean inner product. In one example, at least one of the pipeline Boolean logic circuits includes a Boolean adder circuit, and in another example, a plurality of pipeline Boolean logic circuits 130 may be configured in parallel to each other and each pipeline Boolean logic circuit loaded with the same test very long data word, but a different search term.

The synaptic coprocessor 100 via its processing logic unit 128 may be configured to perform calculations at a clock rate and compute a Boolean inner product between the search term and the test VLDW at a latency to obtain the results after multiple clocks. The conventional CPU interface as part of the 64-bit address and data buses 112 may communicate with external devices and the processor 104 and may be configured to receive a plurality of 64-bit words via the conventional CPU interface 112 and reformat the 64-bit words into a test VLDW having a length of anywhere from more than 1000 bits to 1 million or more bits. Although 64-bit words may be standard in some instances, other conventional length bit data words may be received and that data reformatted into a very long data word. The processor 104 may also include a serial interface 108 and associated digital logic. The serial interface 108 as part of a conventional CPU interface may pass serial data to the digital logic as part of the processor 104 to reformat the serial data into very long data words.

During processing, the result as sum logic from the processing logic unit 128 with its Boolean logic is a measure of the similarity of the search term and the test VLDW and may be buffered (FIG. 4) in a first-in first-out (FIFO) buffer 150, allowing access to the results of the processing. The buffer 150 may include buffering logic having controls that can be used to reduce the number of results that a central processing unit as part of the processor 104 may read.

As noted before, the processing logic unit 128 may include an attention processing unit 134, also referred to as attention logic, that provides the ability to modify the “search” term as an example very long data word (VLDW) to express only features or bits of interest or to exclude features that are not of interest and produce a focused search term. The focused search term is then processed in another section of the processing logic unit, the vector processing unit 138, which includes the scoring logic 142 (FIG. 3) and operates on example one-dimensional arrays, such as a VLDW.

The search term and focus term may be preloaded within a pipelined preload circuit and the DMA controller 120 may begin to rapidly cycle through the very long data words stored in memory 124. A single processing pipeline 130 may include the attention logic circuit, such as the illustrated attention processing unit 134, and additional processing logic, such as the vector processing unit 138, and a buffer circuit with associated logic that may include the FIFO buffer 150 shown in FIG. 4, which may operate as a storage buffer. Multiple processing pipelines 130 may operate in parallel where each processing pipeline may be preloaded with different search terms or focus terms and all pipelines may process the same very long data words from memory 124.
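
As a non-limiting illustration of the parallel arrangement, the sketch below models several pipelines that are each preloaded with a different search term while every pipeline sees the same test word on each memory cycle. It is a sequential software model under stated assumptions: Python integers stand in for VLDWs, list indices stand in for memory addresses, and the name run_pipelines is hypothetical.

```python
def run_pipelines(memory, search_terms):
    """Score every stored word against every preloaded search term.

    Returns one (address, scores) pair per stored word, where scores[k]
    is the Boolean inner product computed by pipeline k.
    """
    results = []
    for address, test_vldw in enumerate(memory):
        # All pipelines receive the same test word on this cycle.
        scores = [bin(test_vldw & term).count("1") for term in search_terms]
        results.append((address, scores))
    return results


# Two stored words and two pipelines, using 4-bit words for readability.
memory = [0b1100, 0b0011]
search_terms = [0b1000, 0b0111]
results = run_pipelines(memory, search_terms)
```

In hardware the inner list would be computed concurrently by the pipelines; the loop here only models the data flow.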

The processor 104 may also generate control signals to select a mode, such as in selecting the Boolean logic operation that may include AND, OR, EXOR, NAND, Left Circular Shift, or Right Circular Shift. The focused search term that results from the attention processing unit 134 may be vector processed 138 and the address and result sent back to the processor 104, including the Boolean inner product, which in this example had been buffered, along with the memory address from which the VLDW was retrieved from the VLDW memory 124. The processor 104 may periodically scan through the buffer 150 and inspect the matched results. To reduce the load on the processor 104, the Boolean inner products may be compared to a threshold, and those Boolean inner products that are greater than the threshold may pass to the buffer 150 for storage therein. The processor 104 may inform the DMA controller 120 as to the start and end address for blocks of VLDW memory 124 to be searched. The processor 104 is free to perform other functions while the DMA controller 120 drives the search and addresses operations for the different terms. At the conclusion of that processing function, the processor 104 may inspect the storage buffer 150 looking for the results of interest.
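
As a non-limiting illustration of the thresholded search, the sketch below scans a block of memory between a start and end address and buffers only those results whose Boolean inner product is greater than the threshold, together with the address of the test word. The name search_block, the use of a Python deque as the FIFO, and the inclusive end address are assumptions made for explanatory purposes.

```python
from collections import deque


def search_block(memory, search_term, start, end, threshold):
    """Scan addresses start..end (inclusive, an assumed convention).

    Buffers (address, score) pairs whose score exceeds the threshold.
    """
    fifo = deque()
    for address in range(start, end + 1):
        score = bin(memory[address] & search_term).count("1")
        if score > threshold:
            # Only results above the threshold reach the buffer.
            fifo.append((address, score))
    return fifo


# Four stored 8-bit words stand in for test VLDWs.
memory = [0b1111_0000, 0b1010_1010, 0b0000_1111, 0b1111_1111]
results = search_block(memory, 0b1111_0000, 0, 3, threshold=2)

# The processor later drains the buffer in first-in first-out order.
matches = list(results)
```

With these inputs the words at addresses 0 and 3 each score 4 and are buffered, while the words at addresses 1 and 2 score 2 and 0 and are discarded.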

Using the conventional processor interface 112, in an example, a 64 kilobit very long data word may be processed at the processor 104 via data received over the standard conventional processor interface as 1,024 64 (sixty-four) bit words, and stored in the VLDW memory 124 as one 64 kilobit word. Within the synaptic coprocessor 100, the very long data words may be transported between VLDW memory 124 and the processing logic unit 128 as single very long data words with massively parallel processing as shown by the plurality of processing pipelines 130 (FIGS. 1 and 5), with control signals generated from the processor 104 to the various processing pipelines (FIG. 5) and the address and results buffered within the buffer 150 (not shown in FIG. 5) and sent back to the processor 104. Very long data words may be loaded into the different processing pipelines 130 of the processing logic unit 128 as search terms or focus terms (FIG. 2) and very long data words loaded from VLDW memory 124. It may be possible to load a series of 64-bit words from the processor 104 and the local 64-bit memory 116 (FIG. 1) and organized as 64-bit words. The synaptic coprocessor 100 has this adaptability for processing and generating those words having different word lengths.
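
As a non-limiting illustration of the reformatting, the sketch below concatenates 1,024 64-bit words into a single 64 kilobit (65,536-bit) word and recovers them again. The bit ordering, in which word 0 occupies the lowest 64 bits, is an assumption, as are the names pack_vldw and unpack_vldw; the disclosure does not specify an ordering.

```python
WORD_BITS = 64


def pack_vldw(words):
    """Concatenate 64-bit words into one long word.

    Word 0 is placed in the lowest 64 bits (assumed ordering).
    """
    vldw = 0
    for index, word in enumerate(words):
        if not 0 <= word < (1 << WORD_BITS):
            raise ValueError("each input word must fit in 64 bits")
        vldw |= word << (index * WORD_BITS)
    return vldw


def unpack_vldw(vldw, count):
    """Split a long word back into `count` 64-bit words."""
    mask = (1 << WORD_BITS) - 1
    return [(vldw >> (i * WORD_BITS)) & mask for i in range(count)]


# 1,024 words of 64 bits each give one 65,536-bit word.
words = [(i * 0x0101010101010101) % (1 << WORD_BITS) for i in range(1024)]
vldw = pack_vldw(words)
assert vldw.bit_length() <= 1024 * WORD_BITS
```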

As shown in FIG. 3, the processing logic unit 128 includes the attention processing unit 134 that receives a focus term, also referred to as an attention data word, and may be loaded with a very long data word that allows any associated search term (target data word) to be modified to express or extinguish specific features of interest. The various functions of AND, NAND, OR, NOR, EXOR, Left Circular Shift, and Right Circular Shift may apply individually or collectively to each of the processing pipelines 130. Each processing pipeline 130 may be loaded with a different focus term and loaded with a different search term. Each processing pipeline 130 may be set to perform a different Boolean operation via a control signal generated to a specific processing pipeline 130 from the processor 104 to select the mode for the Boolean calculation as shown in FIG. 3.

Referring again to FIG. 4, there is shown a data multiplexer 154 coupled to the processor 104 that receives a search term and/or focus term in this example, multiplexes the data corresponding to the term, and stores the term in a register 156 labeled Register A. A very long data word is received from the VLDW memory 124 into another register 160 labeled Register B, and the Boolean logic circuit 164 receives and logically operates on the data as words from both Registers A and B, and outputs the result as the Boolean inner product corresponding to the sum logic 164, which is sent as the processing results to the FIFO buffer 150 as the storage buffer. The DMA controller 120 in this example controls the movement of the very long data words from the VLDW memory 124 to the different processing pipelines 130 under the governance of a computer software program operating in the processor 104, and the data moves between the VLDW memory 124 and the processing logic unit 128 as very long data words that can range in length from about 1,000 bits to at least one million bits. The data multiplexer 154 may also reformat words from more conventional data words, such as 64-bit data words as received from the processor 104, into a very long data word format. Thus, a very long data word as a focused search term (FIG. 3) may operate via Boolean operations as in the vector processing unit 138 and as part of the processing logic unit 128 to produce an output to the sum logic 164 (FIG. 4).

Referring now to FIG. 5, there are illustrated multiple processing pipelines 130 in which a search term, a focus term and control signal are each generated from the processor 104 and received within each of the processing pipelines 130. The terms may be the same or different. The generated control signal received into each of the processing pipelines 130 may be individually selected for a specific Boolean operation in each of the respective processing pipelines 130. The address of the test VLDW from the VLDW memory in this example and the result as the Boolean inner product from each processing pipeline 130 may be sent back to the processor 104 for further logical processing and/or comparison in an example. The processor 104 may output a control signal for an address range to the DMA controller 120, which operates with the processing pipelines 130 and the VLDW memory 124 to select via a memory address the test VLDW and input as selected test very long data words. The processor 104 may generate target data words as search terms and attention data words as focus terms for the respective target word (search term) (FIG. 6).

The target data word as the target word or search term may be generated from the processor 104 and sent to the Target Word (search term) Register 170, and the attention data word as a focus term may be generated by the processor 104 and sent to the Attention Word (focus term) Register 174. Boolean logic 178 operates on data contained in both the Target Word Register 170 and Attention Word Register 174 and outputs to Boolean logic circuit 164, which receives the test VLDW from the test VLDW register 180 and outputs a bit vector to sum logic 164.

A control signal may also be generated from the processor 104 to select a mode in the sum logic 164 where the Boolean inner product may be expressed, together with other Boolean inner products, as a histogram representing a probability distribution. For example, there could be a number of 64-bit words, and certain “hits” may be scattered to the low end and high end, and it is possible to obtain a probability distribution as in 64 bins. Each bin may be the sum of the number of hits in 1,000 bits, and the synaptic coprocessor 100 obtains a 64 point approximation to the distribution. This is a helpful way to determine if a first answer A is better than a second answer B. One aspect is if the nodes for all characteristics are randomly distributed across the entire range, it may be more difficult to read into the correlation of low end versus the high end. For this reason, the data may be arranged pseudo-randomly, and in an example, with globally random and locally ordered arrays, where the distribution of data is not fully randomized.
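
As a non-limiting illustration of the histogram mode, the sketch below divides the bit vector of hits into 64 consecutive bins of 1,000 bits each and counts the hits in each bin, giving a 64 point approximation to the distribution. The name hit_histogram and the choice of bin 0 as the lowest 1,000 bits are assumptions made for explanatory purposes.

```python
import random


def hit_histogram(search_term, test_vldw, bins=64, bin_width=1000):
    """Count hits per bin, where a hit is a position with a one in both words."""
    hits = search_term & test_vldw
    mask = (1 << bin_width) - 1
    # Bin b covers bit positions b*bin_width .. (b+1)*bin_width - 1.
    return [
        bin((hits >> (b * bin_width)) & mask).count("1")
        for b in range(bins)
    ]


# Random 64,000-bit words stand in for a search term and a test VLDW.
rng = random.Random(1)
search = rng.getrandbits(64_000)
test = rng.getrandbits(64_000)

histogram = hit_histogram(search, test)
# The bins partition the word, so they sum to the Boolean inner product.
assert sum(histogram) == bin(search & test).count("1")
```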

Referring now to FIG. 7, there is illustrated the attention logic as part of the attention processing unit 134 of FIG. 3 and illustrating the Target Word Register 170 for the search term, and the Attention Word Register 174 for the focus term and the Boolean logic circuit 178 that outputs the focused target word, i.e., focused search term. Similar components are shown in FIG. 6. This attention processing unit 134 allows the search term to be stripped down to a lower weight vector for processing in the vector processing unit 138 (FIG. 3). This lower weight vector reflects content of interest or content to exclude. The logic circuit as the attention processing unit 134 may employ a very long arithmetic logic unit (ALU) and Boolean logic 178 to combine the target word as the search term and the focus term as the attention word in many possible ways to strip the vector down to what is of interest or what should be excluded. The Boolean logic 178 at the attention processing unit 134 is placed into one of several possible Booleanoperational modes by the processor, e.g., AND, NAND, OR, NOR, EXOR, Left Circular Shift, and Right Circular Shift.
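
As a non-limiting illustration of the operational modes, the sketch below combines a target word and an attention word under a selected two-input mode, and provides circular shifts as separate helpers. The names focus, circular_shift_left and circular_shift_right are hypothetical, and treating the shifts as single-word operations with an explicit shift amount is an assumption, since the manner in which a shift mode uses the attention word is not detailed here.

```python
def focus(target, attention, mode, width):
    """Combine target and attention words under the selected Boolean mode."""
    mask = (1 << width) - 1  # keeps inverted results within the word width
    if mode == "AND":
        return target & attention
    if mode == "NAND":
        return ~(target & attention) & mask
    if mode == "OR":
        return target | attention
    if mode == "NOR":
        return ~(target | attention) & mask
    if mode == "EXOR":
        return target ^ attention
    raise ValueError(f"unsupported mode: {mode}")


def circular_shift_left(word, amount, width):
    """Rotate `word` left by `amount` bits within `width` bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((word << amount) | (word >> (width - amount))) & mask


def circular_shift_right(word, amount, width):
    """Rotate `word` right by `amount` bits within `width` bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((word >> amount) | (word << (width - amount))) & mask


# AND with an attention word keeps only the features of interest.
focused = focus(0b1100, 0b1010, "AND", width=4)
```

In this model the AND mode yields a lower weight vector than the original target word, which matches the stripping down described above.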

Referring now to FIG. 8, there are illustrated further details of an example of the vector processing unit 138 of FIG. 3 and showing the scoring logic functions that occur within the vector processing unit as part of the processing logic unit 128. The focused search term may be received from the attention processing unit 134 and the Boolean logic circuit 164 outputs a bit vector to the summation logic circuit 164 and the resulting “score” or histogram is stored in this example within the FIFO buffer 150. The VLDW memory 124 may store thousands or millions of the very long data words and the results of the summation may be loaded or updated to the processor 104. Operations may be controlled at a high level. The Boolean logic circuit 164 may perform a dot product calculation in some examples.

In the synaptic coprocessor 100 as described, floating point multiplications are not computed, and instead, the very long data words (VLDWs) are processed via the processing logic unit 128 to perform bit operations on multiple registers, such as D_I = (A_I AND B_I) AND (NOT C_I), where A, B, C, and D are 64 kilobit registers and I indicates the bit number ranging from 1 to 2¹⁶. The processor 104 in an example may perform bitwise operations such as Bit Set/Bit Get and Bit Shifts as single word operations with AND, OR, EXOR, NOR, NAND, Left Circular Shift, and Right Circular Shift and complement with Bit Level Masks. The Bit Level Dot Product may require greater than one clock cycle to complete. Multiple coprocessors may be implemented within a chip to increase throughput.
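As a minimal sketch of the register operation above, the following uses Python integers as stand-ins for the 64 kilobit registers A, B, C, and D. The function names `combine` and `popcount` are hypothetical; the hardware performs the operation on all bit positions in parallel rather than in software.

```python
import random

N_BITS = 1 << 16            # 64 kilobit registers, as in the text
FULL = (1 << N_BITS) - 1    # all-ones mask keeps NOT within the register width

def combine(a, b, c):
    """D_I = (A_I AND B_I) AND (NOT C_I), applied to every bit position at once."""
    return (a & b) & (~c & FULL)

def popcount(x):
    """Number of ones in x, i.e., the bit-level dot product of x with all ones."""
    return bin(x).count("1")

rng = random.Random(1)      # fixed seed so the sketch is repeatable
a, b, c = (rng.getrandbits(N_BITS) for _ in range(3))
d = combine(a, b, c)
```

No multiplication appears anywhere in the sketch; the score is obtained by counting ones in the result.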

Referring now to FIG. 9, there is illustrated a schematic block diagram of an example of another segment of the synaptic coprocessor 100 architecture that illustrates different data registers and associated bitwise binary operations. An example processor data register (A1) 200 receives data from the processor 104, such as the search term or focus term. The main store data register (D1) 204 may receive the test VLDW. Mask registers (B1) 208 and (C1) 212 receive data from the processor 104. Control signals are input with memory mapped from the processor 104 address and data from the registers 200,204 input to the logic circuits for bitwise binary operations X and Y (220,224) with those circuits also receiving data from mask registers 208,212, which receive input from further bitwise binary operations Z 226, which in turn, receives output from the bitwise binary operations X and Y (220,224). The adder tree 230 is shown. Data is delivered over the data bus and includes input from the control logic circuit W 238.

A masking function may not be required when there are no constraints. In a real-life example, a main storage such as memory 124 may hold signatures for investment types, to be searched, as an example, for semiconductor stocks with at least 15% growth for the last two years. In this example, there are no constraints. The processor 104 loads the signature for a semiconductor stock with a 15% growth over the last two years into Register A1 (200) and commands a search over all investment signatures in the memory 124. The DMA logic as the DMA controller 120 in this example loads each investment signature one at a time, at the full clock speed of the processor 104. With each clock in this example, the loaded signature is bit-wise ANDed with the signature in Register A1, and the number of resulting ones in a register Z (not shown) as part of the bitwise binary operation Z 226 is counted by the adder tree 230. Each result that exceeds the threshold programmed into control logic W 238 is sent to the processor 104 via the data bus 234.
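The unconstrained search just described may be sketched as a simple loop. The sketch assumes 8-bit toy signatures in place of very long data words, a Python list in place of the memory 124, and hypothetical names `search` and `popcount`; the loop stands in for the DMA scan and is not the hardware implementation.

```python
def popcount(x):
    """Count the ones in x (the role played by the adder tree)."""
    return bin(x).count("1")

def search(memory, register_a1, threshold):
    """Scan every stored signature; return (address, score) for scores above threshold."""
    hits = []
    for address, signature in enumerate(memory):   # stands in for the DMA scan
        z = signature & register_a1                # bitwise AND into register Z
        score = popcount(z)                        # adder tree counts the ones
        if score > threshold:                      # threshold test in control logic W
            hits.append((address, score))
    return hits

# Toy 8-bit signatures; real signatures would be thousands of bits or more.
memory = [0b10110010, 0b11110000, 0b00001111, 0b10110011]
target = 0b10110011
results = search(memory, target, threshold=3)
```

Only the addresses and scores of signatures that pass the threshold are returned, so the processor does not need to inspect every result.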

With constraints, a selected region of the main VLDW storage 124 is to be searched to determine the response or distance of each signature from a target signature subject to possible constraints. For example, all investment types except junk bonds may be searched, or alternatively, only selected aspects of an input target may be searched, such as only mutual funds. The match of selected aspects of the signatures from memory 124 may be scored and is processed one very long data word at a time over a selected range of addresses. The results may be copied into another region of memory 124 or the score may be copied to the main memory or sent to the processor 104 with source addresses.

As an example set-up, the processor 104 may load into Register A1 200 the aggregate signature of the desired results and load into mask register B1 208 any constraints about which bits to include or exclude from the signature in Register A1. The processor 104 may load a control field that controls the operation of a register X associated with bitwise binary operations X 220, according to whether the “masking” operation is one of inclusion or exclusion or other binary operations. The processor 104 may load into mask register C1 (212) any constraints about signatures to be tested, in terms of bits to include or exclude. The processor 104 may load the control field that controls the operation of a register Y associated with bitwise binary operations Y 224, according to whether the “masking” operation is one of inclusion, exclusion, or other binary operations. The processor 104 may load into the DMA controller 120 the first and last address of the region of the storage for the VLDW memory 124 to be searched and load the control field that controls the operation of the register Z and associated bitwise binary operations Z 226 and the type of binary operation to be performed on the inputs from registers X and Y associated with respective bitwise binary operations X and Y 220,224. This may be a bit-for-bit AND as the Boolean operation between X and Y. The processor 104 may load a control field that determines whether the final result is to be taken from register Z associated with the bitwise binary operations Z 226 or the adder tree 230, or to move the result, and whether to threshold test the result.
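The set-up sequence above may be summarized in a short software sketch. It assumes only two mask modes ("include" and "exclude"), a bit-for-bit AND for operation Z, 8-bit toy words, and hypothetical names `apply_mask` and `masked_search`; other binary operations mentioned in the text are omitted for brevity.

```python
def popcount(x):
    return bin(x).count("1")

def apply_mask(value, mask, mode, width):
    """Register X or Y: keep the masked bits ('include') or drop them ('exclude')."""
    full = (1 << width) - 1
    if mode == "include":
        return value & mask
    if mode == "exclude":
        return value & ~mask & full
    raise ValueError("unsupported mask mode: " + mode)

def masked_search(memory, first, last, a1, b1, x_mode, c1, y_mode, threshold, width):
    """Search addresses first..last (inclusive) with masks B1 and C1 applied."""
    x = apply_mask(a1, b1, x_mode, width)              # constrained target signature
    hits = []
    for address in range(first, last + 1):             # DMA address range
        y = apply_mask(memory[address], c1, y_mode, width)
        z = x & y                                      # bit-for-bit AND of X and Y
        score = popcount(z)                            # adder tree result
        if score > threshold:                          # optional threshold test
            hits.append((address, score))
    return hits

memory = [0b11111111, 0b10101010, 0b11110000, 0b00001111]
hits = masked_search(memory, first=1, last=3,
                     a1=0b11111100, b1=0b00001111, x_mode="exclude",
                     c1=0b11110000, y_mode="include",
                     threshold=3, width=8)
```

Address 0 is never examined because it lies outside the first and last addresses loaded into the DMA range, which mirrors searching only a selected region of the storage.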

Referring now to FIG. 10, there is illustrated another schematic block diagram of logic and showing an address distribution logic 250 between the processor 104 and RAM 254 that may include very long data word RAM storage. The data distribution logic 258 may include multiplexing and demultiplexing functions and is coupled to the register logic 262, which may include a notation of that data distribution and number of bits in a data word. The register logic 262 may operate on very long data words such as from a smaller 256 bits wide to 256 megabits wide and the register logic may perform bit-wise operations on all bits, including a very long data word on each cycle of the clock, such as a 1 GHz clock. The very long data words may be stored in the VLDW memory 124 that may include RAM 254, and data words transferred to and from the register logic 262 on each clock cycle. This allows high input and output and processing rates that are roughly 1 GHz, assuming one N bit for very long data words as equal to one bit of binary operations per second. The register logic 262 and RAM 254 may be able to communicate with a host CPU using CPU words as 64-bit words, for example.

At the address distribution logic circuit 250, the clock may increment the address and least significant bits (LSBs) when data is transferred from the processor 104 to an address buffer as part of the address distribution logic 250 or from an address buffer to the processor 104. On the RAM 254 side, the address may have the least significant bit set, and the transfer to and from the memory may occur on a single clock cycle. The address distribution logic circuit 250 may include an address buffer and address least significant bit controls as an up-and-down counter and any control logic and a clock input.

Referring now to FIG. 11, a schematic block diagram of example details of the data distribution logic 268 is illustrated, and showing the clock distribution control circuit 270 and the processor 104 that couples to a selection logic circuit 274 and a data buffer 282 operative with the memory as RAM 254 in this example and receiving input from a plurality of latches 286, e.g., flip flops, all operative in this example with the selection logic circuit 274. The data input to the processor 104 from an external device (FIG. 10) may be a standard data word such as a 64-bit data word and the data input to the data distribution logic 268 from the memory may be a data word, e.g., N times 256. For example, the processor data interface may be 64 bits and the RAM interface may be 256 bits where M=4, or the processor data interface may be 64 bits and the RAM interface may be 256 K bits with M=1024 as a non-limiting example.
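The width conversion between the 64-bit processor interface and the wider RAM interface may be sketched as follows for the M=4 case. The ordering of words within the wide word (first word in the low-order bits) and the names `pack` and `unpack` are assumptions made for illustration; the text does not specify them.

```python
CPU_BITS = 64   # processor data interface width from the text

def pack(words):
    """Assemble M 64-bit CPU words into one wide word (first word in the low bits)."""
    wide = 0
    for i, w in enumerate(words):
        if not 0 <= w < (1 << CPU_BITS):
            raise ValueError("each CPU word must fit in 64 bits")
        wide |= w << (i * CPU_BITS)
    return wide

def unpack(wide, m):
    """Split a wide word back into m 64-bit CPU words."""
    mask = (1 << CPU_BITS) - 1
    return [(wide >> (i * CPU_BITS)) & mask for i in range(m)]

words = [0x1111111111111111, 0x2222222222222222,
         0x3333333333333333, 0x4444444444444444]
ram_word = pack(words)          # M = 4 gives a 256-bit RAM word
```

In this sketch, four transfers over the 64-bit interface fill one 256-bit word, which may then move to or from the memory as a single unit.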

Referring now to FIG. 11A, there is illustrated an external sensor 300, such as a camera, that is configured to generate output sensor data. The external sensor 300 may include a processor (not shown) that configures the output sensor data as smaller bit words that are combined by the synaptic coprocessor 100 into a very long data word. The processor 104 is configured to receive the sensor data and generate a very long data word corresponding to the output sensor data. The processing logic unit 128 is configured to compare this converted very long data word to a search term. The external sensor 300 may be a camera having object recognition software that provides an input as data to describe what the sensor has detected in a very long data word format, such as an elastic representation VLDW. That output sensor data is converted into a very long data word and may be compared to search terms and focus terms loaded into any processing pipelines 130. In the alternative, the sensor data may be loaded into the processing pipelines 130 as a search term and compared against very long data words from memory 124, looking for strong matches and various attention settings.

As noted before, the processor 104 may be configured to perform calculations at a single clock rate and compute a Boolean inner product between the search term and the test VLDW at a latency to obtain the results after multiple clocks. The bitwise ANDing may be started in one clock, but the process may require complex adder trees, and the calculations may be accomplished in a processing pipeline 130, e.g., a new ADD can be started every clock. Carry-Ahead Adders (CAA) may be used, which may include log₂(n) gate delays where “n” is the word size in bits.

Referring now to FIG. 11B, there is illustrated another embodiment of the synaptic coprocessor 100 and showing a VLDW buffer 320 as part of the vector processing unit 138 and optional DMA controller 120 and VLDW memory 124 shown in dashed format. A buffer 320 in the processing pipeline 130 compares input data as a target data word against entries in its local or internal memory buffer 320 instead of using the DMA controller 120 to compare against a block of entries in the main VLDW memory 124. This configuration may use less hardware since it is not necessary to employ the DMA controller 120 and VLDW memory 124. The standard processing pipeline 130 using the DMA controller 120 and VLDW memory 124 compares each input VLDW against all the VLDWs within some segment of memory, using DMA logic to scan through the memory. The processing pipeline 130 that incorporates the VLDW buffer 320 as part of the vector processing unit 138 does not compare an input VLDW against a VLDW in memory 124, and instead, the host CPU as the processor 104 loads one or more VLDWs into the VLDW buffer 320 as part of a processing pipeline 130, and the processing pipeline compares each input VLDW against the VLDWs stored in the local VLDW buffer 320.

The same focus logic as described may be employed, and the change is that the source of VLDWs to be compared moves from the main VLDW memory 124 to the local VLDW buffer 320. Each processing pipeline 130 may have different VLDWs stored into its local VLDW buffer 320. It is possible to use a mix of “standard” and “buffered” processing pipelines 130. For example, there may be two standard (using DMA control) and six buffered (using buffer 320) processing pipelines and another design may use four processing pipelines 130 that are incorporating data from the VLDW memory 124 and the DMA controller 120 and four processing pipelines 130 may include the VLDW buffer 320. Thus, the processing logic unit 128 in this example may include a processing buffer, i.e., a VLDW buffer 320, within a subset of processing pipelines 130 and the processing logic unit 128 may receive a search term from the processor 104 and compute a Boolean inner product between the search term and the test VLDW indicative of the measure of similarity between the test VLDW and the search term. The processing logic unit 128 may include a plurality of processing pipelines 130 as Boolean logic circuits, each having a processing buffer 320 into which a plurality of test VLDWs are buffered. This structure and function as explained with reference to FIG. 11B may work with the optional VLDW memory 124 and DMA controller 120 as illustrated. Thus, the processing logic unit 128 may include a first plurality of processing pipelines 130 as Boolean logic circuits each having a processing buffer 320 into which a plurality of test VLDWs are buffered, and a second plurality of processing pipelines 130 as Boolean logic circuits and each configured to receive a test VLDW from the VLDW memory 124.

FIG. 12 shows a graph having a plot for the delay of a CAA (Carry-Ahead Adder) as a function of word size with word or group sizes ranging from 4 to 4,096 bits and the lines referenced with letters A to F. From simulation results, it is evident that it is possible to obtain the “log₂ N” gate delays, which is shown in the lower curve of FIG. 12 labeled “A.” The graph of FIG. 12 indicates that by using larger groups, this synaptic coprocessor 100 may be approximated with less nominal delay. It is possible that multiple levels of groups may be helpful. Much wider adders may be used, and the simulation shows the positive results.

Referring to FIG. 13, there is illustrated a block diagram of an example from a simulation of the logic for a 16-bit wide adder illustrated at 400 with four-bit groups as part of the carry look ahead adder segment 404. This block diagram shows the sum and the time taken to generate all sum bits as (10.2) units and the delay to calculate the sum in each bit position.

Referring now to FIGS. 14A-14C, there is illustrated an expanded schematic block diagram of the logic circuit 450 for 64-bit processing and a carry look ahead adder. The schematic block diagram would become much more complicated for 1,024 bits. For 1 million bits, the increasing complexity would make reproduction as a readable schematic block diagram impractical, even on many square feet of paper.

An adder tree may impose a processing latency of log₂(N) clocks for an N-bit word, such that for a 1 million bit word, there would be a 20 clock latency. The first stages in an example may have small values to be added and may not require carry look ahead logic. At the bottom of the adder tree, there may be some benefit to using carry look ahead logic. The synaptic coprocessor 100 may use a relatively slow clock rate, somewhere between 500 MHz and 1 GHz, because random access memory (RAM) is much slower than computational logic. The slow clock rate may make it easier for an adder tree to keep up. Thus, it is possible that a 4 GHz clock for the adder tree logic as carry look ahead logic may be used, but new inner products may be computed at a 1 GHz rate or lower.
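The relationship between word size and adder tree depth may be checked with a short sketch. It models the tree as pairwise sums, one level per clock, which is an assumption about the pipeline staging; the function name `adder_tree_popcount` is hypothetical.

```python
import math

def adder_tree_popcount(bits):
    """Sum a list of 0/1 values pairwise; return (total, number of tree levels)."""
    level = list(bits)
    levels = 0
    while len(level) > 1:
        if len(level) % 2:            # pad odd-sized levels with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels += 1
    return (level[0] if level else 0), levels

total, levels = adder_tree_popcount([1, 0, 1, 1, 0, 1, 1, 1])

# Depth needed for a 1 million bit word, assuming one tree level per clock.
latency_1m = math.ceil(math.log2(1_000_000))
```

For the 8-bit toy input the tree has 3 levels, and for a 1 million bit word the same rule gives the 20 clock latency stated above, since log₂(1,000,000) is slightly below 20.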

There now follows a description of the very long data words also referred to as elastic representation VLDWs that represent a knowledge representation placed into binary form. It should be understood that the synaptic coprocessor 100 may operate with a systematic system that represents knowledge within an artificial neural network (ANN) and includes a large number, e.g., many thousands of “nodes,” where each node may be assumed to approximate the behavior of a biological neuron. Meaning may be ascribed to a set of nodes and not to one single individual node, and thus, a set of nodes that means “dog,” for example, may approximate the idea of a memory “engram” in a brain. Any piece of information may be referred to in general terms as a “concept” and every concept may be represented by a set of nodes in the ANN.

Examples of the elastic representation VLDW for dog 500, wolf 504, and rat 508 are shown in FIG. 15, showing the basic categories and data in a schematic diagram that may be encoded as a single illustration into a very long data word. There are a near-infinite number of acceptable elastic representation VLDWs. These simple schematic drawings of these elastic representation VLDWs show that overlapping data may correspond to the matching of “ones” when the Boolean inner product is computed. This type of data representation indicates that optical processors and associated optical computing may be used. A laser may quickly determine matches and overlaps.

One aspect is that similar concepts have a similar representation, e.g., two ideas may be encoded by a similar set of nodes. For example, a motorcycle may be compared to a car and in many ways, they are similar. They both convey passengers and have roughly the same size and cost, both travel on roads and have other similar attributes and details. They are different, however, in the number of passengers the vehicles carry and the ability to travel off-road and the ability to travel in inclement weather, and thus, the two vehicles have some similar representations and other representations not similar.

A possibility is to encode cars and motorcycles with these data attributes to the extent relevant to the mission. For example, the representation for a car may encode its size, weight, MPG, range, safety information, and similar details. Similar encoding may be used for motorcycles. A representation for each may be the aggregation of the representations for each property. Thus, the representation for cars and motorcycles includes these similarities and differences and the degree of similarity and difference.

The very long data word as an elastic representation VLDW may be encoded at the desired level of detail since it contains thousands of bits, up to about a million bits or more. The attention logic unit 134 within the synaptic coprocessor 100 may focus on the most relevant attributes for a given situation. For example, if the weather is fine, it may not matter that motorcycles are unsafe in bad weather.

The very long data words as noted above are referred to as elastic representation VLDWs in one non-limiting example. Most computer code views data in black versus white terms, and for this reason, software is often unreliable and characterized as “brittle.” The synaptic coprocessor 100 processes data such that data may be compared as a matter of degree. A dog is somewhat like an elephant when compared to a shark. The representations are described as “elastic” because the synaptic coprocessor 100 code is not brittle, and this property endows the representations with an innate ability to generalize, which is widely believed to be a foundational capability for artificial general intelligence. The prototypes as developed show that elastic representation VLDWs do, in fact, possess a remarkable degree of generalization, without sacrificing precision. The importance of this is that elastic representation VLDWs as very long data words provide a technique to construct an associative memory that may retrieve stored data based on the degree of conceptual similarity between some input and the stored knowledge. An associative memory may be a starting point for building a system with artificial general intelligence.

The synaptic coprocessor 100 also addresses the role of a concept. A systematic approach to “roles” is part of the science of knowledge representation, and the synaptic coprocessor 100 may readily test for an elastic representation in various roles because the transformations of the representations in the different roles are compatible with the synaptic coprocessor instruction set. For example, the association of “John loves Mary” is very different from the association of “Mary loves John.” In both cases, the meaning is lost unless the representation can convey whether John is the subject or object in the sentence. The synaptic coprocessor 100 establishes a systematic approach to “promote” a representation to one of many possible roles and relationships, by means of prescribed mathematical transformations. This technique can be directly extended to cover more complex cases, such as “John, who is very tall, loves Mary despite her being much shorter than John.”
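The text does not specify which mathematical transformations are used for roles. Purely as an assumption for illustration, the sketch below uses circular shifts (which are among the listed instruction modes) to promote a concept to a subject or object role; the shift amounts, the 64-bit width, the bit patterns, and the names `encode`, `SUBJECT_SHIFT`, and `OBJECT_SHIFT` are all hypothetical.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def rotl(x, n):
    """Left circular shift of a WIDTH-bit word."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def popcount(x):
    return bin(x).count("1")

SUBJECT_SHIFT, OBJECT_SHIFT = 1, 2   # hypothetical role transformations

def encode(subject, obj, verb):
    """Aggregate role-shifted subject and object with the verb by bitwise OR."""
    return rotl(subject, SUBJECT_SHIFT) | rotl(obj, OBJECT_SHIFT) | verb

# Hand-picked sparse toy patterns standing in for concept representations.
john = (1 << 0) | (1 << 10) | (1 << 20) | (1 << 30)
mary = (1 << 5) | (1 << 15) | (1 << 25) | (1 << 35)
loves = (1 << 40) | (1 << 50)

john_loves_mary = encode(john, mary, loves)
mary_loves_john = encode(mary, john, loves)

# Query: how strongly does each sentence contain "John as subject"?
john_as_subject = rotl(john, SUBJECT_SHIFT)
score_jlm = popcount(john_as_subject & john_loves_mary)
score_mlj = popcount(john_as_subject & mary_loves_john)
```

With these toy patterns the two sentences share only the bits for the verb, so a Boolean inner product can tell them apart even though both contain the same three concepts.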

There are elastic controls as flow constructor VLDWs. An intelligent system may not usually be built based only on data. Sophisticated systems require a significant body of control functions as part of the system design. Conventional systems often treat control as totally separate from the data and divide the design into a “control plane” versus a “data plane.” The synaptic coprocessor 100 may blur the line between data and control, so that the synaptic coprocessor may embed control functions within an associative memory. Many of the advantages realized for data may be applied to control functions.

Control functions may be implemented with flow constructor VLDWs, which may be encoded as elastic representation VLDWs but also specify actions to be performed. Some of the actions may be calls to hardware drivers or commands to the synaptic coprocessor to change course or speed. Other flow constructor VLDWs may adjust the operating parameters of an associative memory by adjusting thresholds or maximum queue lengths or perhaps by modifying the parameter settings that control “breadth versus depth of search” in accordance with the urgency, risk and reward of the current situation. Still other flow constructor VLDWs can adjust the parameters that control what and how the system learns based on experience.

Referring now to FIG. 16, there is illustrated a block diagram at 600 showing how elastic representation VLDWs may be generated, and showing a cognition construction and visualization framework 604 operative with associative memory 608 that includes a knowledge scaffold 612. Many factors enter into the design of the elastic representation VLDWs and input to the cognition construction and visualization framework 604 such as:

Precision - Some concepts such as digits or financial data may be represented with good precision. A rough approximation is fine for many concepts but certainly not all.

Range - The concept of the size of a horse can often be dealt with by a rough approximation, but dogs on the other hand have sizes that span the range from Chihuahua to Newfoundland. Sometimes elastic representation VLDWs may encode the expected range of a concept.

Degree of Generalization - Elastic representation VLDWs may be designed to permit the underlying concepts to be generalized beyond their normal bounds, but overgeneralization can flood the processing with matches or inferences which are too weakly related to be of any value. Too little generalization may limit the apparent intelligence of an AGI system.

Dimensions - A system typically encodes concepts in 1D or 2D representations. There may be future applications which will require three or more dimensions.

Flow connectors and flow constructor VLDWs - These building blocks control the processing flow, establish processing parameters, including learning functions, and may activate hardware drivers.

A tool may be implemented such as the Cognitive Construction and Visualization Framework 604 to simplify the complex job of building an AGI-like system using elastic representation VLDWs.

Large-scale applications of the system may require vast processing power; fortunately the very nature of elastic representation VLDWs opens the door to processing architectures which can process data at astonishing speeds in terms of operations/second and not clock speed.

The synaptic coprocessor 100 may process elastic representation VLDWs at very high speeds and make it possible to implement large-scale AGI systems that process the data in real time. A general functional view of the synaptic coprocessor chip design is shown in FIG. 17 at 650. The plurality of processing pipelines that operate in parallel are not illustrated, allowing the input data to be processed simultaneously with multiple, different Attention Logic settings. Thus, for example, each input from a sensor, such as an image of a specimen that is a candidate to be collected, could be processed and encoded 654 to infer simultaneously its potential mission value. This could include its risks to the mission. Data could be obtained by collecting specimens from the ocean floor that may contain high concentrations of methane and may present a significant risk to a drilling platform, for example, and the ability to physically collect the specimen based on size, fragility, and similar factors may be advantageous. Attention logic 656 is coupled with the associative memory 608 and similarly computation circuit 660 and a mission supervisor 664.

It is possible that the synaptic coprocessor as a chip manufactured with current, conventional semiconductor technology may surpass 10¹⁵ operations/second. This chip may perform this large amount of processing for two reasons: (1) the computations are simple because they mimic the simple operations of a synapse, as opposed, for example, to the much more intensive computational operations of multiplying 64-bit floating point numbers; and (2) the nature of elastic representation VLDWs facilitates processing large amounts of data in parallel. Current processing chips, such as more conventional CPUs, often have many computing cores to increase the processing power, but partitioning the data and algorithms across those cores is difficult, and for some algorithms, impossible. The challenge of partitioning the processing load across a multitude of processing elements is a barrier to achieving ever higher processing throughput. This is not the case with elastic representation VLDWs, which provide a simpler solution to the challenge of massively parallel processing. Possible applications relevant to the synaptic coprocessor 100 include:

Robotics - Equipping industrial and humanoid robots with AGI will greatly extend the range of services they can offer.

Autonomous vehicles - From underwater to space-borne, on land, in the air, and on the sea surface, autonomous vehicles are expanding at a breathtaking pace. Today’s AI technology offers unreliable control of these vehicles and there are often too many vehicles (“swarms” of autonomous vehicles) for humans to control. AGI is often the only viable solution.

Knowledge Assistants - Expert technical assistance for almost any technical discipline from medicine to finance and science to construction.

Interactive Toys - Imagine today’s toys with sensors that are augmented to provide a warm and fun response to interactions with children.

Cybersecurity - AGI offers a robust approach to rapidly assessing the intent and appropriate remediation in the presence of a flood of low-level indications and warnings.

Anomaly Detection - Financial fraud in an audit, corporate, or banking environment, insurance fraud, manufacturing defects, and code defects.

Cognitive Warfare - Everything from cognitive radios to cognitive electronic warfare (EW), cognitive sensors to cognitive battlefield management.

The synaptic coprocessor 100 may use the very long data words in sparse matrix techniques similar to a superposition of smaller words and some data representations. In an example, each bit may be conceptually analogous to a synapse of a human brain. The synaptic coprocessor 100 as a chip may interface with the outside world as if it includes a 64-bit interface, but internally operate with very long data words. Because processing pipelines 130 are used, the calculations may be accomplished in a clock cycle, and in an example, compute a Boolean inner product between a search term and test VLDW at a latency to obtain the results after multiple clocks. It can be clocked at a DMA rate and it is not based on floating point or those types of standard computer representations.

In the very long data words used with the synaptic coprocessor 100, the bits have values of one and the coprocessor may use a type of unary arithmetic. This is in comparison to more conventional graphics processing units that use massive amounts of parallel floating point operations. The synaptic coprocessor 100 may operate similar to dot-product engines, resulting in a measure of similarity of two words, such as two very long data words. It is possible to apply a threshold instead of the synaptic coprocessor 100 inspecting every dot-product result. There may be a million word buffer, but if only 8 “hits” are above the threshold, those may be processed further with the address from which each of those 8 “hits” came, indicating the match for that vector.

In an example, each processing pipeline 130 includes an adder tree organized as a processing pipeline such as with a series of summers. The synaptic coprocessor 100 may be packaged as a single chip in this example because of the challenges associated with breaking up and the fan out of the very long data words. The adder tree may include logic that replicates thousands of parallel lines for a very long data word, such as a 1 million bit wide word. The 1’s and 0’s may be in a linear matrix for vector processing such as in vector processing unit 138. A histogram could be the result of a partial summation and taking the adder tree and tapping it off at a higher level. It may be possible to have some ordering of the summation results on chip or off chip and have serial Boolean and logic circuits.

There now follows a description of techniques for generating the elastic representation VLDWs that represent very long data words. In a concept, it may be understood that “neurons that fire together are wired together.” Each idea or mental concept is represented by a small population of neurons that wire themselves together via new synaptic connections into a set of neurons that behave as a “locked” set. When most of the neurons become active, the entire set becomes active. This strategy in the past has been called “voting” logic, where all neurons in the set vote the same way. The concept of an engram is represented by the entire set. An individual neuron may convey no meaning. Each neuron in the set may also be a member of thousands of other sets. With 1 million neurons simulated, and 32 neurons in the set that includes one mental concept, the number of unique concepts that can be represented is nearly infinite. For example, 1,000,000³² is equal to about 10¹⁹² unique combinations. To be useful, there should be some separation between concepts corresponding to the Hamming distance, so the number of useful representations may be less than 10¹⁹². Calculating the number of useful representations may depend on design parameters.
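The counting argument above may be checked with exact integer arithmetic. The sketch computes the 1,000,000³² figure from the text and, as an added comparison that is not in the original text, the number of unordered sets of 32 distinct nodes, which is one way the count shrinks before any Hamming distance separation is imposed. The variable names are hypothetical.

```python
import math

NODES = 1_000_000     # simulated neurons, from the text
SET_SIZE = 32         # neurons per concept, from the text

# The figure quoted in the text: 32 independent choices out of 1,000,000.
upper_bound = NODES ** SET_SIZE

# Added comparison: unordered sets of 32 distinct nodes (binomial coefficient).
distinct_sets = math.comb(NODES, SET_SIZE)

digits_upper = len(str(upper_bound))       # number of decimal digits
digits_sets = len(str(distinct_sets))
```

The first value is exactly 10¹⁹². The second is smaller, on the order of 10¹⁵⁶, which is still far more than any practical number of concepts and is consistent with the statement that the useful count may be less than 10¹⁹².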

For example, referring to the dot design of 680 showing a linear array of nodes in FIG. 18A, the filled-in nodes may represent the neurons that have been wired together to represent a mental concept as an engram. There may be different layers of elastic representation VLDWs.

Referring now to FIG. 18B and the parallel linear array of nodes at 684, there are examples of elastic representation VLDWs to convey the size of an animal. The examples use 6 out of 32 nodes in the representation, but a more realistic example may use 32 out of 64,000 nodes. The first example in the first line shows an example elastic representation VLDW for “medium size” and the second example on the second line shows the example elastic representation VLDW for “smaller than medium size.” The third example on the third line shows the example elastic representation VLDW for “larger than medium size” and the last example on the last line shows the example elastic representation VLDW for “much larger than medium size.”
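The exact node patterns of FIG. 18B are not reproduced in the text. As a hypothetical layout only, the sketch below encodes each size as a block of 6 adjacent active nodes out of 32 whose position slides with the size, so that nearby sizes share nodes and distant sizes do not. The positions chosen and the names `size_word` and `overlap` are assumptions for illustration.

```python
NODES, ACTIVE = 32, 6     # 6 out of 32 nodes, as in the example

def size_word(start):
    """Set ACTIVE adjacent nodes beginning at position start (hypothetical layout)."""
    if not 0 <= start <= NODES - ACTIVE:
        raise ValueError("window does not fit in the word")
    return ((1 << ACTIVE) - 1) << start

def overlap(a, b):
    """Boolean inner product: the number of nodes active in both words."""
    return bin(a & b).count("1")

medium = size_word(13)
smaller = size_word(11)       # "smaller than medium size"
larger = size_word(15)        # "larger than medium size"
much_larger = size_word(20)   # "much larger than medium size"
```

Under this layout, "medium" overlaps "smaller" and "larger" in 4 nodes each and overlaps "much larger" in none, so the Boolean inner product falls off with the difference in size, which is the elastic behavior described above.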

Many modifications and other embodiments of the invention will come to the mind of one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is understood that the invention is not to be limited to the specific embodiments disclosed, and that modifications and embodiments are intended to be included within the scope of the appended claims.

1. A coprocessor, comprising: a memory configured to store a plurality of Very Long Data Words, each comprising a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW; a processor configured to generate search terms; and a processing logic unit configured to: receive a test VLDW from the memory; receive a search term from the processor; and compute a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term, wherein the processing logic unit comprises one or more pipeline Boolean logic circuits that compute the Boolean inner product.
2. The coprocessor of claim 1, further comprising a buffer configured to store the Boolean inner products resulting from the computation between each search term and the test VLDW together with the address in memory from which the test VLDW was read.
3. The coprocessor of claim 1, further comprising a Direct Memory Access (DMA) controller connected to the processor and memory and configured to address and control the transfer of test VLDWs from memory to the processing logic unit.
4. The coprocessor of claim 1, wherein said processor includes a conventional CPU interface for communicating with external devices, wherein said processor is configured to receive VLDWs as a plurality of 64-bit words via the conventional CPU interface and reformat the 64-bit words into a test VLDW having a length of about one thousand bits to at least one million bits.
5. The coprocessor of claim 1, comprising an external sensor connected to the coprocessor and configured to generate sensor data, said processor is configured to receive the sensor data and generate a sensed data VLDW from the sensor data, wherein said processing logic unit is configured to compare the sensed data VLDW to a search term.
6. The coprocessor of claim 1, wherein said processor is configured to generate a plurality of test VLDWs, and said processing logic unit includes a processing buffer into which the plurality of test VLDWs are buffered.
7. The coprocessor of claim 1, wherein said processor includes a serial interface and digital logic, wherein said serial interface passes serial data to said digital logic to reformat the serial data into very long data words.
8. A coprocessor, comprising: a memory configured to store a plurality of Very Long Data Words, each comprising a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW; a processor configured to generate search terms; and a processing logic unit configured to: receive a test VLDW from the memory; receive a search term from the processor; and compute a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term, wherein the processor is configured to perform calculations at a clock rate and compute a Boolean inner product between the search term and the test VLDW at a latency to obtain results after multiple clocks.
9. The coprocessor of claim 8, further comprising a buffer configured to store the Boolean inner products resulting from the computation between each search term and the test VLDW together with the address in memory from which the test VLDW was read.
10. The coprocessor of claim 8, further comprising a Direct Memory Access (DMA) controller connected to the processor and memory and configured to address and control the transfer of test VLDWs from memory to the processing logic unit.
11. The coprocessor of claim 8, wherein said processor includes a conventional CPU interface for communicating with external devices, wherein said processor is configured to receive VLDWs as a plurality of 64-bit words via the conventional CPU interface and reformat the 64-bit words into a test VLDW having a length of about one thousand bits to at least one million bits.
12. The coprocessor of claim 8, comprising an external sensor connected to the coprocessor and configured to generate sensor data, said processor is configured to receive the sensor data and generate a sensed data VLDW from the sensor data, wherein said processing logic unit is configured to compare the sensed data VLDW to a search term.
13. The coprocessor of claim 8, wherein said processor is configured to generate a plurality of test VLDWs, and said processing logic unit includes a processing buffer into which the plurality of test VLDWs are buffered.
14. The coprocessor of claim 8, wherein said processor includes a serial interface and digital logic, wherein said serial interface passes serial data to said digital logic to reformat the serial data into very long data words.
15. A coprocessor, comprising: a memory configured to store a plurality of Very Long Data Words, each comprising a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW; a processor configured to generate search terms; and a processing logic unit configured to: receive a test VLDW from the memory; receive a search term from the processor; and compute a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term, wherein the processing logic unit is configured to operate on successive test VLDWs compared against a search term.
16. The coprocessor of claim 15, further comprising a buffer configured to store the Boolean inner products resulting from the computation between each search term and the test VLDW together with the address in memory from which the test VLDW was read.
17. The coprocessor of claim 15, further comprising a Direct Memory Access (DMA) controller connected to the processor and memory and configured to address and control the transfer of test VLDWs from memory to the processing logic unit.
18. The coprocessor of claim 15, wherein said processor includes a conventional CPU interface for communicating with external devices, wherein said processor is configured to receive VLDWs as a plurality of 64-bit words via the conventional CPU interface and reformat the 64-bit words into a test VLDW having a length of about one thousand bits to at least one million bits.
19. The coprocessor of claim 15, comprising an external sensor connected to the coprocessor and configured to generate sensor data, said processor is configured to receive the sensor data and generate a sensed data VLDW from the sensor data, wherein said processing logic unit is configured to compare the sensed data VLDW to a search term.
20. The coprocessor of claim 15, wherein said processor is configured to generate a plurality of test VLDWs, and said processing logic unit includes a processing buffer into which the plurality of test VLDWs are buffered.